-
Notifications
You must be signed in to change notification settings - Fork 216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix various issues that existed in the rotate_db_snapshots DAG #2158
Conversation
It doesn't seem like there's actually any way to configure the API versions used by Airflow. It totally ignores additional settings for anything other than S3, as far as I can tell, when configuring the STS client. If @AetherUnbound and @stacimc can confirm there isn't anything obvious I'm doing wrong that is causing this to occur locally, I'll open an issue with Airflow to ask for help/see if there's a bug here or if there's a way to explicitly configure the API version for STS (if indeed that's really what's going on here). The "missing parameter" message confuses me because the STS error doesn't appear to be related to that. AWS claims it only emits missing parameter errors when a parameter is missing (seems obvious) but there are no missing parameters, at least not at the level of our code. |
If you're trying to run this locally, it's likely attempting to use the local dev AWS account (which interacts with The reason we wouldn't want to change the base AWS connection info locally is because that would mean that all logging files and any ingestion that gets run locally would also be sent to production. We haven't had to do a whole lot of direct-to-AWS connection work yet beyond S3 so we don't have the best dev setup flow for this. It seems like setting up the |
More sensible Airflow variable names to reduce confusion of what is being referenced Use the RdsHook to standardise the usage of AWS_RDS_CONN_ID for all RDS operations and simplify connection method Update unit tests Only delete snapshots managed by the Airflow DAG (ignore automatic snapshots and manually created snapshots that do not have the DAG-specific prefix)
hook_params
templatingc07749c
to
ded9ff1
Compare
# This cannot be pulled in the DAG itself because ``hook_params`` | ||
# on the operators provided by the amazon provider is not templated | ||
RDS_REGION = Variable.get("CATALOG_RDS_REGION", "us-east-1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder - I didn't need to supply a region in #2099 when I specify it in the Airflow connection parameter: AIRFLOW_CONN_AWS_RDS=aws://<id>:<secret>@?region_name=us-east-1
We really want to try and avoid Variable.get
s in DAG files since they get run (unnecessarily) every time the DAGs are parsed. Are we able to remove this if we define the region using the conn ID now that we're using a separate RDS connection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would assume so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look great, and I was able to test this locally as well and confirm the new snapshot was created and the old one deleted! I have one note for improving the sensor, otherwise this is good to go 🚀
@@ -87,16 +107,15 @@ def rotate_db_snapshots(): | |||
db_snapshot_identifier=snapshot_id, | |||
# This is the default for ``target_statuses`` but making it explicit is clearer | |||
target_statuses=["available"], | |||
hook_params=hook_params, | |||
aws_conn_id=AWS_RDS_CONN_ID, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for missing this in previous reviews - this should have the mode=reschedule
parameter set as well so it doesn't take up a worker slot while waiting for the snapshot. (in general this reduces overhead for Airflow). Unfortunately Airflow doesn't have a way to set this as the default globally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really wish there was an easy thing like the eslint "no-restricted-syntax" rule for Python to encode these sorts of things. It's so tedious to try to remember all the dozens of little rules and best practices and for reviewers to have to try to catch them. Not to mention when the reviewer isn't someone who is so familiar with Airflow best practices!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking the same as I was typing this out 😞 Way way easier to remember that way. We might be able to implement custom ruff linting checks perhaps, I'll try to look into it a bit!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently not yet: astral-sh/ruff#283
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would allow us to do this: https://github.com/hchasestevens/bellybutton
I remember now that I opened an issue for it: #1317
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Tested with staging and it worked perfectly.
# Other manual snapshots may exist; we only want to automatically manage | ||
# ones this DAG created | ||
snapshots = [ | ||
snapshot | ||
for snapshot in snapshots | ||
if snapshot["DBSnapshotIdentifier"].startswith( | ||
AIRFLOW_MANAGED_SNAPSHOT_ID_PREFIX | ||
) | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice detail!
Fixes
Originally this started as a PR to fix
hook_params
not being templated in the AWS operators, but it turned into much more.Original issue being fixed: If you have access to production Airflow, here are the logs:
https://airflow.openverse.engineering/dags/rotate_db_snapshots/grid?search=rotate_db_snapshots&dag_run_id=manual__2023-05-22T03%3A56%3A33.707669%2B00%3A00&task_id=create_snapshot&tab=logs
Log output copied:
Description
Summary of fixes:
Variable.get
to work aroundhook_params
not being a templated field on the AWS operatorsAWS_RDS_CONN_ID
everywhere in this DAG.Testing Instructions
Run the DAG locally against the staging API database by setting
CATALOG_RDS_DB_IDENTIFIER
todev-openverse-db
. Create a new AWS connection via the Airflow UI namedaws_rds_conn_id
. SetCATALOG_RDS_SNAPSHOTS_TO_RETAIN
to 2 and check against the RDS snapshots dashboard here: https://us-east-1.console.aws.amazon.com/rds/home?region=us-east-1#database:id=dev-openverse-db;is-cluster=false;tab=maintenance-and-backupsYou should see a new snapshot get created. While the DAG waits for the snapshot to become available, you'll see an additional snapshot in the list with the prefix from the DAG. After the snapshot becomes available, you should see the oldest of the previously existing snapshots get deleted.
Checklist
Update index.md
).main
) or a parent feature branch.Developer Certificate of Origin
Developer Certificate of Origin